Video quality measurement considering multiple artifacts

ABSTRACT

A particular implementation determines objective distortion levels (d_(i)) respectively for a plurality of artifact types. The objective distortion levels are aligned to ensure that the same distortion level of different types of artifacts corresponds to the same perceived distortion level. The aligned distortion levels (d_(i)′) are sorted to obtain sorted distortion levels (d_(i)″). The sorted distortion levels are then pooled together into an overall distortion level or an overall quality metric. The sorted distortion levels may be pooled using a weighted sum, wherein the weight is larger when the sorted distortion level is greater.

TECHNICAL FIELD

This invention relates to video quality measurement, and more particularly, to a method and apparatus for determining an overall video quality metric in response to multiple artifacts.

BACKGROUND

Video quality losses may be caused by various events, for example, by lossy compression and transmission errors, and they may be perceived by human eyes as various types of visual artifacts. For example, blockiness, ringing, and blurriness are typical artifacts caused by lossy compression.

On the other hand, different types of artifacts may be perceived when the video quality is degraded by transmission errors. For example, when a packet loss is detected at the transport layer, a decoder may apply error concealment in order to reduce the strength of visual artifacts. Artifacts may still be perceived after error concealment, and we denote the remaining artifacts as channel artifacts. In another example, when a reference frame is entirely lost, a decoder may freeze decoding and repeat the previously correctly decoded picture until a frame that does not refer to the lost frame is correctly received, thus causing a visual pause. We denote such a visual pause as a freezing artifact. The freezing artifact may also be caused by buffer underflow. For example, when there is a network delay, a frame may not be available yet at a scheduled display time (i.e., the buffer underflows) and the display pauses until the frame becomes available.

SUMMARY

According to a general aspect, picture data including a plurality of artifact types are accessed. Aligned distortion levels are sorted to obtain sorted distortion levels, wherein each of the aligned distortion levels corresponds to a respective one of the plurality of artifact types, and wherein a particular value of the aligned distortion levels corresponds to a respective perceived distortion level. An overall distortion level is determined in response to the sorted distortion levels, wherein a greater sorted distortion level has a greater impact on the overall distortion level.

According to another general aspect, picture data including a plurality of artifact types are accessed. Respective objective distortion levels are determined for the plurality of artifact types. The objective distortion levels are aligned to obtain the aligned distortion levels, wherein each of the aligned distortion levels corresponds to a respective one of the plurality of artifact types, and wherein a particular value of the aligned distortion levels corresponds to a respective perceived distortion level. The aligned distortion levels are sorted to obtain sorted distortion levels. An overall distortion level is determined as a weighted sum of the sorted distortion levels, wherein a first weight for a first sorted distortion level is greater than a second weight for a second sorted distortion level if the first sorted distortion level is greater than the second sorted distortion level.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram depicting an example for generating an overall video distortion level in response to multiple artifact types, in accordance with an embodiment of the present principles.

FIG. 2 is a pictorial example depicting how an objective distortion level corresponds to a subjective distortion level for three artifact types.

FIG. 3 is a block diagram depicting an example of a video quality monitor, in accordance with an embodiment of the present principles.

FIG. 4 is a block diagram depicting an example of a video processing system that may be used with one or more implementations.

DETAILED DESCRIPTION

When multiple types of visual artifacts are present in a video, the artifact strength for an individual type of artifact, namely an artifact level or a distortion level, may be measured by a variety of methods.

The artifact strength may be ranked by subjective viewing tests, which are generally best but time consuming. In the present application, the artifact strength or distortion level ranked manually (for example, through subjective viewing tests) is denoted as the perceived distortion level or the subjective distortion level.

The artifact strength may also be estimated by a variety of algorithms aiming to predict the perceived distortion level. For example, existing artifact detection methods for measuring blockiness, ringing, blurriness, and freezing artifacts may be used to provide the distortion levels. In the present application, the distortion level estimated by algorithms is denoted as the distortion level, the estimated distortion level, or the objective distortion level.

The present principles estimate an overall distortion level or an overall quality metric in response to distortion levels from a variety of artifacts. Mathematically, the estimation can be described as:

D = f(d_1, d_2, \ldots, d_m),

where m is the number of artifact types under consideration, d_(i), i=1, . . . , m, is the estimated distortion level for artifact type i, and D is the overall distortion level to be estimated. In one embodiment, the overall distortion level D may be converted to an overall quality metric Q.

FIG. 1 illustrates an exemplary method 100 that determines an overall distortion level based on distortion levels of individual artifact types. At step 110, distortion levels can be determined respectively for individual artifact types. For example, when m types of artifacts are considered, a distortion level, denoted as d_(i), can be determined for the i^(th) type of artifact, where i=1, . . . , m.

At step 120, the distortion levels are aligned. The step of distortion level alignment is to ensure that the distortion levels are adjusted so that the same distortion level of different types of artifacts corresponds to the same perceived distortion level. For ease of notation, the adjusted distortion level for d_(i) is denoted as d_(i)′, and the mapping process from d_(i) to d_(i)′ is denoted as a function h_(i)( ). That is, the distortion level alignment process may be mathematically denoted as d_(i)′=h_(i)(d_(i)), where i=1, . . . , m.

The aligned distortion levels, d_(i)′, i=1, . . . , m, are then sorted, for example, in a descending order or in an ascending order, at step 130. The sorted distortion levels can be denoted as d_(i)″, i=1, . . . , m. When they are sorted in a descending order, d₁″ ≥ . . . ≥ d_(m)″, and when they are sorted in an ascending order, d₁″ ≤ . . . ≤ d_(m)″.

Using the sorted distortion levels, an overall distortion level or a quality metric may be estimated through a pooling strategy at step 140. In the following, the steps of distortion level alignment (120) and overall distortion level determination (140) are discussed in further detail.
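The flow of method 100 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the present principles: the function name overall_distortion and the idea of passing the alignment functions h_(i)( ) and the pooling strategy as callables are assumptions made for the example, and a descending sort order is assumed at step 130.

```python
from typing import Callable, Sequence


def overall_distortion(
    levels: Sequence[float],
    aligners: Sequence[Callable[[float], float]],
    pool: Callable[[Sequence[float]], float],
) -> float:
    """Sketch of steps 120-140; `levels` holds the d_i produced at step 110."""
    if len(levels) != len(aligners):
        raise ValueError("one alignment function is needed per artifact type")
    # Step 120: d_i' = h_i(d_i) for each artifact type.
    aligned = [h(d) for h, d in zip(aligners, levels)]
    # Step 130: sort in descending order so that index 0 is the strongest artifact.
    ordered = sorted(aligned, reverse=True)
    # Step 140: pool the sorted levels into one overall distortion level.
    return pool(ordered)
```

Because the pooling function only ever sees the sorted list, it has no access to which artifact type produced which value.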

Distortion Level Alignment

FIG. 2 illustrates exemplary relationships between the objective distortion levels (d_(i)) and the subjective distortion levels. In this example, the distortion levels are within the range of (0, 1), where 0 corresponds to no distortion (i.e., the best quality) and 1 corresponds to the highest distortion level (i.e., the worst quality). In other embodiments, the distortion levels can be scaled or shifted to other ranges, for example, to (1, 5) or to (1, 100).

In FIG. 2, the horizontal axis represents the objective distortion level, and the vertical axis represents the subjective distortion level. Lines 210, 220, and 230 correspond to three types of artifacts respectively. For an objective distortion level represented by line 240, all three artifact types are measured at the same objective distortion level, but they correspond to different subjective distortion levels. Thus, values of estimated distortion levels (d₁, d₂, and d₃) are not comparable subjectively, and may not be used to compare the subjective distortion levels of different artifact types. For example, when d₁>d₂, the perceived annoyance of the first artifact type may not necessarily be stronger than that of the second artifact type. In another example, when d₁=d₂, the perceived annoyance of the first and second artifact types may not be identical.

The purpose of distortion level alignment is to make the distortion levels comparable. That is, to adjust the distortion levels so that the same distortion level of different artifact types corresponds to the same subjective distortion level. As discussed before, the alignment process for the i^(th) type of artifact may be denoted mathematically as d_(i)′=h_(i)(d_(i)).

To derive the function h_(i)( ), a curve fitting method may be used. In one embodiment, the function h_(i)( ) may be defined as a third-order polynomial function:

h_i(d_i) = \beta_{1,i} \times d_i^3 + \beta_{2,i} \times d_i^2 + \beta_{3,i} \times d_i + \beta_{4,i},

where β_(1,i), β_(2,i), β_(3,i), and β_(4,i) are model parameters, which may be trained by subjective datasets.

In other embodiments, the function h_(i)( ) may be defined as a polynomial function of another order, or it may be an exponential function, or a logarithmic function.
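As an illustration of the third-order embodiment, the alignment function can be evaluated as in the following Python sketch. The function name align and the coefficient values are hypothetical; in practice the β parameters would come from training on subjective data.

```python
def align(d, beta):
    """Evaluate h_i(d) = b1*d**3 + b2*d**2 + b3*d + b4 using Horner's rule.

    `beta` is the tuple (b1, b2, b3, b4) for one artifact type.
    """
    b1, b2, b3, b4 = beta
    return ((b1 * d + b2) * d + b3) * d + b4


# Hypothetical coefficients for illustration only, not trained values.
BETA_IDENTITY = (0.0, 0.0, 1.0, 0.0)   # leaves the objective level unchanged
BETA_EXAMPLE = (0.0, 0.5, 0.5, 0.0)    # pulls mid-range levels downward

aligned_example = align(0.5, BETA_EXAMPLE)
```

One coefficient tuple is kept per artifact type, so supporting a new artifact type only requires a new tuple.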

Overall Distortion Level Determination

After the alignment and sorting steps, the perceived artifact strength is controlled by the value of d_(i)″, regardless of its corresponding artifact type. For example, assuming videos V₁ and V₂ are affected by two types of artifacts: compression artifacts and channel artifacts, consider the following two exemplary scenarios:

1. In video V₁, the compression artifact is measured at d₁′ after the alignment step, and the channel artifact is measured at d₂′ (d₁′>d₂′, that is, the compression artifact is stronger than the channel artifact); and

2. In video V₂, the channel artifact is measured at d₁′ and the compression artifact is measured at d₂′. That is, the channel artifact in video V₂ is at the same distortion level as that of the compression artifact in video V₁, and the compression artifact in video V₂ is at the same distortion level as that of the channel artifact in video V₁.

After sorting, for example, in a descending order, d₁″=d₁′ and d₂″=d₂′ for both videos V₁ and V₂. Since the overall distortion level is estimated based on the sorted distortion levels (d_(i)″), it would be the same for both V₁ and V₂, even though the distortion levels for individual artifact types are different.
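The two scenarios can be checked with a few lines of Python. The aligned values 0.7 and 0.3 are made up for illustration and stand in for d₁′ and d₂′.

```python
# Aligned distortion levels per artifact type; the numbers are illustrative only.
v1 = {"compression": 0.7, "channel": 0.3}
v2 = {"compression": 0.3, "channel": 0.7}

# Descending sort discards the artifact type and keeps only the strengths.
sorted_v1 = sorted(v1.values(), reverse=True)
sorted_v2 = sorted(v2.values(), reverse=True)

print(sorted_v1 == sorted_v2)  # prints True
```

Since both videos yield the same sorted list, any pooling function applied to that list returns the same overall distortion level for V₁ and V₂.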

Thus, after the alignment and sorting steps, the overall distortion level may be predicted from the sorted objective distortion levels without considering the corresponding artifact types. Consequently, the problem of combining the distortion levels of multiple artifact types can be simplified.

Mathematically, the problem of pooling multiple sorted distortion levels into one overall distortion level can be denoted as

D = g(d_1'', \ldots, d_m'').

Linear or non-linear functions may be used to represent the function g( ), and various training methods can be used to obtain model parameters for the function g( ). In one embodiment, a weighted sum is used:

D = g(d_1'', \ldots, d_m'') = \alpha_1 \times d_1'' + \alpha_2 \times d_2'' + \cdots + \alpha_m \times d_m'' + \alpha_{m+1},  (1)

where α_(i), i=1, . . . , m+1, are model parameters for the function g( ) and may be determined by a training process. In other embodiments, other methods, for example, a learning machine, a support vector machine (SVM), or an artificial neural network (ANN) may be used.

It is observed from our experiments that human eyes usually pay more attention to the strongest artifacts and evaluate the quality or distortion level of the video mainly based on these strongest artifacts. In addition, it is observed that the weaker the artifact is, the less impact it has on human perception. Thus, model parameter α_(i)>α_(j) if d_(i)″>d_(j)″.

In one embodiment, assuming d₁″ represents the strongest artifact, consequently α₁ is greater than α_(i), i≠1. To speed up the computation, we may approximate Eq. (1) with

D = g(d_1'', \ldots, d_m'') = \alpha_1 \times d_1'' + \alpha_2.  (2)

In other embodiments, we may choose to consider only the first few strongest artifacts.
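The weighted sum of Eq. (1) and its truncated variants can be sketched in Python as below. The function names pool_weighted and pool_top_k are hypothetical, the inputs are assumed to be sorted in descending order already, and the example weights are illustrative rather than trained. Note that a truncated model carries its own parameter set (k weights plus one constant), which would be trained separately from the full model.

```python
def pool_weighted(sorted_levels, alphas):
    """Eq. (1): `alphas` holds m weights followed by one constant term."""
    *weights, offset = alphas
    if len(weights) != len(sorted_levels):
        raise ValueError("expected one weight per sorted level plus a constant")
    return sum(a * d for a, d in zip(weights, sorted_levels)) + offset


def pool_top_k(sorted_levels, alphas, k=1):
    """Pool only the k strongest artifacts; k=1 has the form of Eq. (2)."""
    return pool_weighted(sorted_levels[:k], alphas)


levels = [0.8, 0.4, 0.1]  # d_1'' >= d_2'' >= d_3'' (illustrative values)
full = pool_weighted(levels, [0.6, 0.3, 0.1, 0.0])
strongest_only = pool_top_k(levels, [0.9, 0.05], k=1)
```

Dropping the weaker terms is reasonable under the observation above that their weights are the smallest, so they contribute least to the sum.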

As discussed in the alignment and sorting steps, training processes may be needed to obtain model parameters (for example, β_(j,i) and α_(i)). In the following, using the compression artifact, channel artifact, and freezing artifact as three exemplary artifact types, the training processes are discussed.

Firstly, videos with different artifacts are created and a training dataset may be generated by including:

1. videos affected by coding artifacts only;

2. videos affected by channel artifacts only;

3. videos affected by freezing artifacts only; and

4. videos affected by all three artifact types.

This training dataset is to be used by both the alignment and pooling steps, where the first three types of videos are used by the alignment step and the fourth type of videos is used by the pooling step. Note that the training dataset should include all artifact types under consideration.

Secondly, subjective viewing tests can be performed over the training dataset to provide subjective distortion levels for individual videos. The subjective distortion levels can be denoted as d_(s,j), j=1, . . . , N, where N is the number of videos in the training dataset.

To obtain parameters for the alignment step, objective distortion levels (d_(j), j=1, . . . , N) can be obtained for individual videos, for example, using detection schemes for coding artifacts, channel artifacts, and freezing artifacts. After obtaining the subjective distortion levels and objective distortion levels, a curve fitting method, for example, a least mean square error (LMSE) fitting method, may be used to determine the model parameters β_(1,i), β_(2,i), β_(3,i), and β_(4,i).
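One way to carry out such a fit for a single artifact type is an ordinary least-squares fit of the cubic, sketched below in plain Python by solving the normal equations with Gaussian elimination. This is an assumed realization of the LMSE fitting mentioned above, not necessarily the procedure used in practice; the function name fit_cubic is hypothetical, and the sample data are synthetic values generated from a known cubic purely to exercise the code. At least four distinct objective levels are needed for the system to be solvable.

```python
def fit_cubic(objective, subjective):
    """Least-squares estimate of (b1, b2, b3, b4) in h(d) = b1*d^3 + b2*d^2 + b3*d + b4."""
    rows = [[d ** 3, d ** 2, d, 1.0] for d in objective]
    n = 4
    # Augmented normal equations: (A^T A | A^T y).
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, subjective))]
        for i in range(n)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - tail) / aug[i][i]
    return tuple(beta)


# Synthetic data: objective levels on a grid, subjective levels from a known cubic.
xs = [k / 10 for k in range(11)]
ys = [0.2 * x ** 3 - 0.5 * x ** 2 + 1.2 * x + 0.05 for x in xs]
beta_hat = fit_cubic(xs, ys)
```

The fit is run once per artifact type, using only the videos that contain that artifact type alone.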

To obtain parameters for the pooling step, the sorted distortion levels (d_(i)″) can be obtained for the fourth type of videos (i.e., videos that contain all three artifact types), for example, using Eq. (1). Using the subjective distortion levels and corresponding sorted distortion levels (d_(i)″), the model parameters α_(i) may be calculated. For example, using one of ITU-T P.NBAMS datasets, where sample videos contain compression artifacts and freezing artifacts, the function g( ) is trained as:

g(d_1'', d_2'') = 1.39 \times d_1'' + 0.25 \times d_2'' + 2.00.
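As a worked example, the trained function can be evaluated in Python, reading it as g(d₁″, d₂″) = 1.39 × d₁″ + 0.25 × d₂″ + 2.00. The function name and the input values below are illustrative assumptions; the inputs are taken to be already aligned distortion levels for the two artifact types.

```python
def trained_overall_distortion(compression, freezing):
    """Sort the two aligned levels in descending order, then apply the trained g()."""
    d1, d2 = sorted((compression, freezing), reverse=True)
    return 1.39 * d1 + 0.25 * d2 + 2.00


# Illustrative aligned levels: freezing is the stronger artifact here.
# 1.39 * 0.6 + 0.25 * 0.2 + 2.00 is approximately 2.884.
example = trained_overall_distortion(compression=0.2, freezing=0.6)
```

The weight on the strongest artifact (1.39) exceeds the weight on the weaker one (0.25), which is consistent with the constraint that α_(i)>α_(j) if d_(i)″>d_(j)″.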

The overall distortion level can be converted into an overall quality metric. In general, the higher the distortion level is, the lower the quality metric should be.

One advantage of the present distortion level estimation method is that it is independent of the distortion set. That is, the same estimation steps may be used when different types of artifacts are considered. For example, when the model parameters are determined based on the compression artifacts, channel artifacts, and freezing artifacts, the same model parameters for the pooling step may be used when another set of artifacts (for example, blockiness, ringing, blurriness) is considered.

FIG. 3 depicts a block diagram of an exemplary video quality monitor 300. The input of apparatus 300 may include a transport stream that contains the bitstream. The input may be in other formats that contain the bitstream. The input may also be decoded videos with or without error concealment.

Artifact detector 310 estimates objective distortion levels for individual artifact types, at a bitstream level (i.e., the video is not reconstructed) or at a pixel level (i.e., the video is reconstructed). Distortion level generator 320 estimates an overall distortion level, for example, using method 100. Quality predictor 330 maps the overall distortion level into a quality score.

Referring to FIG. 4, a video transmission system or apparatus 400 is shown, to which the features and principles described above may be applied. A processor 405 processes the video and the encoder 410 encodes the video. The bitstream generated from the encoder is transmitted to a decoder 430 through a distribution network 420. A video quality monitor may be used at different stages.

In one embodiment, a video quality monitor 440 may be used by a content creator. For example, the estimated video quality may be used by an encoder in deciding encoding parameters, such as mode decision or bit rate allocation. In another example, after the video is encoded, the content creator uses the video quality monitor to monitor the quality of encoded video. If the quality metric does not meet a pre-defined quality level, the content creator may choose to re-encode the video to improve the video quality. The content creator may also rank the encoded video based on the quality and charge for the content accordingly.

In another embodiment, a video quality monitor 450 may be used by a content distributor. A video quality monitor may be placed in the distribution network. The video quality monitor calculates the quality metrics and reports them to the content distributor. Based on the feedback from the video quality monitor, a content distributor may improve its service by adjusting bandwidth allocation and access control.

The content distributor may also send the feedback to the content creator to adjust encoding. Note that improving encoding quality at the encoder may not necessarily improve the quality at the decoder side since a high quality encoded video usually requires more bandwidth and leaves less bandwidth for transmission protection. Thus, to reach an optimal quality at the decoder, a balance between the encoding bitrate and the bandwidth for channel protection should be considered.

In another embodiment, a video quality monitor 460 may be used by a user device. For example, when a user device searches for videos on the Internet, a search result may return many videos or many links to videos corresponding to the requested video content. The videos in the search results may have different quality levels. A video quality monitor can calculate quality metrics for these videos and decide which video to store. In another example, the user may have access to several error concealment techniques. A video quality monitor can calculate quality metrics for different error concealment techniques and automatically choose which concealment technique to use based on the calculated quality metrics.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, distortion measurement, quality measuring, and quality monitoring. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, a game console, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

1. A method, comprising: accessing picture data including a plurality of artifact types; sorting aligned distortion levels to obtain sorted distortion levels, wherein each of the aligned distortion levels corresponds to a respective one of the plurality of artifact types, and wherein a particular value of the aligned distortion levels corresponds to a respective perceived distortion level; and determining an overall distortion level in response to the sorted distortion levels, wherein a greater sorted distortion level has a greater impact on the overall distortion level.
2. The method of claim 1, wherein the overall distortion level is determined in response to a subset of the sorted distortion levels.
3. The method of claim 1, wherein the overall distortion level is determined in response to a weighted sum of the sorted distortion levels.
4. The method of claim 3, wherein a first weight for a first sorted distortion level is greater than a second weight for a second sorted distortion level if the first sorted distortion level is greater than the second sorted distortion level.
5. The method of claim 1, further comprising: determining a quality metric in response to the overall distortion level.
6. The method of claim 1, further comprising: determining respective objective distortion levels for the plurality of artifact types; and aligning the objective distortion levels to obtain the aligned distortion levels.
7. The method of claim 6, wherein the respective objective distortion levels for the plurality of artifact types are determined at a bitstream level.
8. The method of claim 6, wherein the aligning the objective distortion levels is performed as a polynomial function.
9. The method of claim 1, wherein the plurality of artifact types include at least one of compression artifacts, channel artifacts, freezing artifacts, blockiness, ringing, and blurriness.
10. An apparatus, comprising: a distortion level generator sorting aligned distortion levels to obtain sorted distortion levels, wherein each of the aligned distortion levels corresponds to a respective one of a plurality of artifact types, and wherein a particular value of the aligned distortion levels corresponds to a respective perceived distortion level, and determining an overall distortion level in response to the sorted distortion levels, wherein a greater sorted distortion level has a greater impact on the overall distortion level.
11. The apparatus of claim 10, wherein the distortion level generator determines the overall distortion level in response to a subset of the sorted distortion levels.
12. The apparatus of claim 10, wherein the distortion level generator determines the overall distortion level in response to a weighted sum of the sorted distortion levels.
13. The apparatus of claim 12, wherein a first weight for a first sorted distortion level is greater than a second weight for a second sorted distortion level if the first sorted distortion level is greater than the second sorted distortion level.
14. The apparatus of claim 10, further comprising a quality predictor determining a quality metric in response to the overall distortion level.
15. The apparatus of claim 10, further comprising: an artifact detector determining respective objective distortion levels for the plurality of artifact types, wherein the distortion level generator aligns the objective distortion levels to obtain the aligned distortion levels.
16. The apparatus of claim 15, wherein the artifact detector determines respective objective distortion levels for the plurality of artifact types at a bitstream level.
17. The apparatus of claim 10, wherein the plurality of artifact types include at least one of compression artifacts, channel artifacts, freezing artifacts, blockiness, ringing, and blurriness.
18. A processor readable medium having stored thereupon instructions for causing one or more processors to collectively perform: accessing picture data including a plurality of artifact types; sorting aligned distortion levels to obtain sorted distortion levels, wherein each of the aligned distortion levels corresponds to a respective one of the plurality of artifact types, and wherein a particular value of the aligned distortion levels corresponds to a respective perceived distortion level; and determining an overall distortion level in response to the sorted distortion levels, wherein a greater sorted distortion level has a greater impact on the overall distortion level.